Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking
Authors
Abstract
Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. One major approach for bipartite ranking, called the pair-wise approach, tackles an equivalent binary classification problem of whether one instance out of a pair of instances should be ranked higher than the other. Nevertheless, the number of instance pairs constructed from the input data can be quadratic in the size of the input data, which makes pair-wise ranking generally infeasible on large-scale data sets. Another major approach for bipartite ranking, called the point-wise approach, directly solves a binary classification problem between relevant and irrelevant instance points. This approach is feasible for large-scale data sets, but the resulting ranking performance can be inferior. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this paper, we develop a novel scheme within the pair-wise approach to conduct bipartite ranking efficiently. The scheme, called Active Sampling, is inspired by the rich field of active learning and can reach a competitive ranking performance while focusing only on a small subset of the many pairs during training. Moreover, we propose a general Combined Ranking and Classification (CRC) framework to accurately conduct bipartite ranking. The framework unifies the point-wise and pair-wise approaches and is simply based on the idea of treating each instance point as a pseudo-pair. Experiments on 14 real-world large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in terms of both accuracy and efficiency.
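As a rough illustration of the ideas in the abstract, the sketch below builds pair-wise training examples for a linear ranker, represents each point as a pseudo-pair, and draws a small subset of pairs instead of enumerating all of them. It is not the authors' implementation: the function names are hypothetical, the pseudo-pair encoding (the point itself with its own label) is one plausible reading of the abstract, and uniform random sampling stands in for the paper's Active Sampling criterion, which the abstract does not specify.

```python
import random


def pairwise_examples(X, y):
    """Enumerate every (relevant, irrelevant) pair as a difference vector.

    For a linear scorer w, ranking x_i above x_j is equivalent to
    classifying x_i - x_j as positive. The number of pairs is
    (#relevant * #irrelevant), which can grow quadratically with the data size.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    return [([a - b for a, b in zip(p, n)], 1) for p in pos for n in neg]


def pseudo_pairs(X, y):
    """Assumed encoding: each point is kept as-is with its own label,
    so point-wise examples can sit next to pair-wise ones."""
    return [(list(x), label) for x, label in zip(X, y)]


def sample_pairs(X, y, budget, seed=0):
    """Draw `budget` pairs uniformly at random without building all pairs.

    Uniform sampling is a placeholder here, not the paper's selection rule.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    sampled = []
    for _ in range(budget):
        p, n = rng.choice(pos), rng.choice(neg)
        sampled.append(([a - b for a, b in zip(p, n)], 1))
    return sampled


# Toy data: two relevant (+1) and two irrelevant (-1) instances.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0], [1.0, 1.0]]
y = [1, -1, 1, -1]

all_pairs = pairwise_examples(X, y)
points = pseudo_pairs(X, y)
subset = sample_pairs(X, y, budget=3)
```

The combined training set in this sketch would simply be `points + subset`, which any linear binary classifier could consume; how the two parts are weighted against each other is left open here.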
Similar resources
Confidence-Weighted Bipartite Ranking
Bipartite ranking is a fundamental machine learning and data mining problem. It commonly concerns the maximization of the AUC metric. Recently, a number of studies have proposed online bipartite ranking algorithms to learn from massive streams of class-imbalanced data. These methods suggest both linear and kernel-based bipartite ranking algorithms based on first and second-order online learning...
Efficient Sampling for Bipartite Matching Problems
Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in real-world applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the Plackett-Luce mode...
Large Scale Learning to Rank
Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing an objective defined over O(n^2) possible pairs for data sets with n examples. In this paper, we remove this super-linear dependence on training set size by sampling pairs from an implicit pairwise expansion and applying efficient stochastic gradient descent learners for...
Large Scale Co-Regularized Ranking
As unlabeled data is usually easy to collect, semi-supervised learning algorithms that can be trained on large amounts of unlabeled and labeled data are becoming increasingly popular for ranking and preference learning problems [6, 23, 8, 21]. However, the computational complexity of the vast majority of these (pairwise) ranking and preference learning methods is super-linear, as optimizing an o...
Solving fully fuzzy Linear Programming Problem using Breaking Points
In this paper we have investigated a fuzzy linear programming problem with fuzzy quantities which are LR triangular fuzzy numbers. The given linear programming problem is rearranged according to the satisfactory level of constraints using breaking point method. By considering the constraints, the arranged problem has been investigated for all optimal solutions connected with satisf...